Conversation
| def __init__(self, hf_dir: str): | ||
| index_file = os.path.join(hf_dir, "model.safetensors.index.json") | ||
| config = AutoConfig.from_pretrained(hf_dir) | ||
| config = AutoConfig.from_pretrained(hf_dir, trust_remote_code=True) |
There was a problem hiding this comment.
we can't set trust_remote_code by default
| hf_model_path, trust_remote_code=trust_remote_code | ||
| ) | ||
|
|
||
| if hasattr(config, "num_nextn_predict_layers"): |
There was a problem hiding this comment.
why do we set this — is this debug code?
There was a problem hiding this comment.
@ArronHZG This should not be needed once the patches here are applied #62 (comment)
| @@ -27,9 +27,17 @@ def init_distributed(): | |||
|
|
|||
| def load_model(hf_model_path, trust_remote_code=False): | |||
| """Load model""" | |||
| bridge = AutoBridge.from_pretrained( | |||
|
|
|||
| # use AutoConfig to change hf config | |||
| config = AutoConfig.from_pretrained( | |||
| hf_model_path, trust_remote_code=trust_remote_code | |||
| ) | |||
|
|
|||
| if hasattr(config, "num_nextn_predict_layers"): | |||
| config.num_nextn_predict_layers = 0 | |||
|
|
|||
| bridge = AutoBridge.from_config(config) | |||
|
|
|||
There was a problem hiding this comment.
These changes should not be needed
There was a problem hiding this comment.
Should MTP be added to the training? My understanding is that it should still be left to the user to decide.
There was a problem hiding this comment.
Should MTP be added to the training? My understanding is that it should still be left to the user to decide.
Ah, if it's for this purpose, I think it's better to remove this line instead, as this flag directly controls whether MTP is enabled on the Megatron side:
mbridge/mbridge/models/mimo.py
Line 29 in ed7d432
| # Handle transformer components within MTP | ||
| # Check if this is a transformer_layer component | ||
| if "transformer_layer" in name: | ||
| # Create a proxy name to use with parent class methods | ||
| # Convert mtp.layers.{idx}.transformer_layer.* to decoder.layers.{idx}.* | ||
| proxy_name = name.replace( | ||
| f"mtp.layers.{mtp_layer_idx}.transformer_layer", | ||
| f"decoder.layers.{mtp_layer_idx}", | ||
| ) |
There was a problem hiding this comment.
These changes are needed (replace lines 84–92 with this code) so that we don't need to disable num_nextn_predict_layers when loading from HF weights (so that MTP weights can be loaded correctly)
| # Handle transformer components within MTP. MCore may expose these under | |
| # either "...transformer_layer.*" or "...mtp_model_layer.*". | |
| layer_prefixes = ("transformer_layer", "mtp_model_layer") | |
| proxy_name = None | |
| for layer_prefix in layer_prefixes: | |
| mcore_prefix = f"mtp.layers.{mtp_layer_idx}.{layer_prefix}" | |
| if mcore_prefix in name: | |
| proxy_name = name.replace( | |
| mcore_prefix, | |
| f"decoder.layers.{mtp_layer_idx}", | |
| ) | |
| break | |
| if proxy_name is not None: |
There was a problem hiding this comment.
Should MTP be added to the training? My understanding is that it should still be left to the user to decide.
This is used to pre-configure the relevant MTP settings based on the model configuration.